A statistical database is a database used for statistical analysis. It is an OLAP (online analytical processing) system rather than an OLTP (online transaction processing) system. Both modern decision-support databases and classical statistical databases are often closer to the relational model than to the multidimensional model commonly used in OLAP systems today.

Statistical databases typically contain parameter data and the measured data for these parameters. For example, parameter data consists of the different values for varying conditions in an experiment (e.g., temperature, time). The measured data (or variables) are the measurements taken in the experiment under these varying conditions.

Many statistical databases are sparse, with many null or zero values; it is not uncommon for a statistical database to be 40% to 50% sparse. There are two options for dealing with the sparseness: (1) leave the null values in place and use compression techniques to squeeze them out, or (2) remove the entries that contain only null values.

Statistical databases often incorporate support for advanced statistical analysis techniques, such as correlations, which go beyond SQL. They also pose unique security concerns, which were the focus of much research, particularly in the late 1970s and early to mid-1980s.

== Security in statistical databases ==
In a statistical database, it is often desired to allow query access only to aggregate data, not individual records. Securing such a database is a difficult problem, since intelligent users can combine aggregate queries to derive information about a single individual. Some common approaches are:
* allow only aggregate queries (SUM, COUNT, AVG, STDEV, etc.)
* return only the partition that a sensitive value such as income falls into (e.g. 35k-40k), rather than the exact value
* return imprecise counts (e.g. indicate that 130-150 records matched the query, rather than exactly 141)
* do not allow overly selective WHERE clauses
* audit all users' queries, so that users who use the system incorrectly can be investigated
* use intelligent agents to automatically detect inappropriate system use

Research in this area has largely stalled; Denning, Denning, and Schwartz (1979) showed that, in general, securing statistical databases was an impossible aim: if they were open to legitimate use, they were also open to abuse; and if they were restricted so tightly as to be incapable of abuse, they would then be useless for practical statistical purposes. To quote:
:The conclusion is that statistical databases are almost always subject to compromise. Severe restrictions on allowable query set sizes will render the database useless as a source of statistical information but will not secure the confidential records.〔Dorothy E. Denning, Peter J. Denning, and Mayer D. Schwartz, "The Tracker: A Threat to Statistical Database Security," ''ACM Transactions on Database Systems (TODS),'' Volume 4, Issue 1 (March 1979), pp. 76-96.〕

Excerpt source: Wikipedia, the free encyclopedia.
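The kind of compromise the quotation describes can be illustrated with a small difference attack, in the spirit of the tracker from the cited paper. The following is only a minimal sketch under stated assumptions: the records, the `MIN_SET` threshold, and the names `query_sum` and `QueryRefused` are all hypothetical and do not come from the article. The toy database answers only SUM queries whose query set is neither too small nor too large, yet two permitted queries still reveal one person's salary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    name: str
    sex: str
    dept: str
    salary: int


# Hypothetical toy data, invented for this illustration.
RECORDS = [
    Record("Alice", "F", "CS", 52000),
    Record("Bob", "M", "CS", 48000),
    Record("Carol", "F", "Math", 61000),
    Record("Dave", "M", "Math", 45000),
    Record("Eve", "F", "Physics", 58000),
    Record("Frank", "M", "Physics", 50000),
    Record("Grace", "F", "Math", 47000),
    Record("Heidi", "F", "Physics", 63000),
]

MIN_SET = 2  # assumed policy: query sets must hold between MIN_SET and n - MIN_SET records


class QueryRefused(Exception):
    """Raised when a query set is too small or too large to be answered."""


def query_sum(predicate):
    """Return the salary SUM over matching records, enforcing the size restriction."""
    matched = [r for r in RECORDS if predicate(r)]
    if not (MIN_SET <= len(matched) <= len(RECORDS) - MIN_SET):
        raise QueryRefused("query set size outside the permitted range")
    return sum(r.salary for r in matched)


# The attacker knows the target is the only woman in the CS department.
# Asking about her directly is refused, because the query set has size 1.
try:
    query_sum(lambda r: r.sex == "F" and r.dept == "CS")
    direct_refused = False
except QueryRefused:
    direct_refused = True

# Two permitted queries whose query sets differ by exactly the target record.
all_women = query_sum(lambda r: r.sex == "F")                       # 5 records
women_not_cs = query_sum(lambda r: r.sex == "F" and r.dept != "CS")  # 4 records

# 281000 - 229000 = 52000, the target's salary.
inferred_salary = all_women - women_not_cs
```

Raising `MIN_SET` far enough to refuse these two queries would also refuse most legitimate statistical queries on a table this small, which is the trade-off the quotation points to.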
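The two options for handling sparseness mentioned in the opening section can also be sketched in a few lines. This is an illustration only: the rows and the column names `pressure` and `yield` are hypothetical, and run-length encoding is used here as one simple stand-in for the unspecified "compression techniques" in the text, not as the method any particular system uses.

```python
from itertools import groupby

# Hypothetical experiment: temperature and time are parameter data,
# pressure and yield are measured data (None marks a missing measurement).
ROWS = [
    {"temperature": 20, "time": 0, "pressure": 1.01, "yield": None},
    {"temperature": 20, "time": 10, "pressure": None, "yield": None},
    {"temperature": 30, "time": 0, "pressure": None, "yield": None},
    {"temperature": 30, "time": 10, "pressure": 1.07, "yield": 0.42},
]
MEASURED = ("pressure", "yield")


def run_length_encode(values):
    """Option 1: keep the nulls, but store each run of equal values as (value, count)."""
    return [(value, len(list(run))) for value, run in groupby(values)]


def drop_all_null(rows):
    """Option 2: remove entries whose measured columns are all null."""
    return [r for r in rows if any(r[m] is not None for m in MEASURED)]


encoded_pressure = run_length_encode([r["pressure"] for r in ROWS])
dense_rows = drop_all_null(ROWS)
```

Option 1 preserves every parameter combination, including those with no measurement, while option 2 shrinks the table but loses the fact that those combinations were part of the experiment.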